[SPARK-13184][SQL] Add a datasource-specific option minPartitions in HadoopFsRelation#options #13320
Conversation
I was thinking actually just use the same config key name, and override it. Can you read my comment on the jira ticket and let me know if that's clear?
oh, sorry. I'll check and fix it.
Test build #59353 has finished for PR 13320 at commit
@rxin Could you check this again and confirm it matches your intent?
Test build #59388 has finished for PR 13320 at commit
I'm going to mark this for 2.1 given we are too close to 2.0 to modify behaviors like this.
Jenkins, retest this please.
Test build #59586 has finished for PR 13320 at commit
Test build #61555 has finished for PR 13320 at commit
@@ -67,9 +67,6 @@ private[sql] abstract class BaseWriterContainer(
   // This is only used on driver side.
   @transient private val jobContext: JobContext = job

-  private val speculationEnabled: Boolean =
-    relation.sparkSession.sparkContext.conf.getBoolean("spark.speculation", defaultValue = false)
Deleted because these lines are not used.
Test build #61566 has finished for PR 13320 at commit
Test build #61577 has finished for PR 13320 at commit
Test build #62473 has finished for PR 13320 at commit
Test build #64052 has finished for PR 13320 at commit
Test build #64077 has finished for PR 13320 at commit
@rxin Could you give me comments?
Test build #68863 has finished for PR 13320 at commit
@gatorsmile Could you check this and give me comments, too?
Test build #77932 has finished for PR 13320 at commit
// Returns all options set in the `SparkConf`, the `SQLConf`, and the given data source `options`.
// If the same key exists in more than one place, the value in `options` takes precedence.
private def optionsOverriddenWith(options: Map[String, String]): Map[String, String] = {
  sparkSession.sparkContext.conf.getAllAsMap ++ sparkSession.conf.getAll ++ options
}
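A minimal sketch of the precedence this method relies on, using plain Scala maps. The names `sparkConfMap`, `sqlConfMap`, and `dsOptions` are illustrative stand-ins, not identifiers from the PR:

// Illustrative only: with `++`, the right-hand map wins on key collisions,
// mirroring the SparkConf -> SQLConf -> data source options precedence above.
val sparkConfMap = Map("minPartitions" -> "2")   // hypothetical SparkConf entry
val sqlConfMap   = Map("minPartitions" -> "4")   // hypothetical SQLConf entry
val dsOptions    = Map("minPartitions" -> "8")   // per-read data source option

val merged = sparkConfMap ++ sqlConfMap ++ dsOptions
assert(merged("minPartitions") == "8")           // the data source option overrides both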
Here, we just cover the code path for DataFrameReader APIs.
To fundamentally resolve this issue, we need to consider all the read/write paths. There is a lot of work to do to achieve that. Do you want to continue the work or just close it?
Ah, ok. I'll check again.
@maropu, I just wonder if this is still in progress in any way.
No, so I'll close this for now and we can move this discussion to JIRA.
What changes were proposed in this pull request?
This PR adds a new option, `minPartitions`, accepted in `HadoopFsRelation#options`. It is a datasource-specific value for the minimum number of splits for input data.
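As a hedged sketch of how such an option would be passed through `DataFrameReader#option` if this change were merged; the CSV format and input path are illustrative assumptions, and `spark` is assumed to be a `SparkSession` in scope:

// Illustrative usage, assuming the proposed `minPartitions` option existed:
// ask the file source to split the input into at least 10 partitions.
val df = spark.read
  .option("minPartitions", "10")    // proposed datasource-specific option
  .csv("/path/to/input")            // hypothetical input path
println(df.rdd.getNumPartitions)    // expected to be >= 10 if the option is honored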
How was this patch tested?
Added unit tests in `FileSourceStrategySuite`.
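The PR's actual tests are not reproduced here; as a rough illustration only, a ScalaTest-style check in a Spark SQL test suite might assert the minimum split count like the sketch below. The test name, the row count, and the use of the `withTempPath` helper are assumptions, not the PR's code:

// Hypothetical test sketch, not the PR's actual test:
test("minPartitions option yields at least the requested number of splits") {
  withTempPath { dir =>
    spark.range(100).write.csv(dir.getCanonicalPath)
    val df = spark.read.option("minPartitions", "10").csv(dir.getCanonicalPath)
    assert(df.rdd.getNumPartitions >= 10)
  }
}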